Scatterplot

We created a scatterplot of the locations (latitude and longitude) of shootings in the five boroughs of NYC and overlaid it on a map of NYC using the leaflet library. Using leaflet's label argument, we also show the month, year, borough, and category of location (private house, public housing, restaurant, etc.) in the text box that appears when hovering over a point. Points are colored by year.
# Import the shooting data, split the date and time into their components,
# and convert those components to numeric values
nypd_shooting_df =
  read_csv("data/nypd_shooting_data.csv") %>%
  janitor::clean_names() %>%
  separate(col = occur_date, into = c("month", "day", "year"), sep = "/") %>%
  separate(col = occur_time, into = c("hour", "minute", "second"), sep = ":") %>%
  mutate(
    across(where(is.character), tolower),
    month = as.numeric(month),
    month_name = recode(month,
      "1" = "january", "2" = "february", "3" = "march", "4" = "april",
      "5" = "may", "6" = "june", "7" = "july", "8" = "august",
      "9" = "september", "10" = "october", "11" = "november", "12" = "december"),
    day = as.numeric(day),
    year = as.numeric(year),
    hour = as.numeric(hour),
    minute = as.numeric(minute),
    second = as.numeric(second),
    # minutes elapsed since midnight
    minute_calc = hour * 60 + minute,
    boro = as.factor(boro),
    boro = fct_relevel(boro, "manhattan", "brooklyn", "bronx", "queens", "staten island")
  ) %>%
  select(incident_key, year, month_name, month, day, hour, minute, second, minute_calc, everything())
## Rows: 25596 Columns: 19
## ── Column specification ────────────────────────────────────
## Delimiter: ","
## chr (10): OCCUR_DATE, BORO, LOCATION_DESC, PERP_AGE_GROUP, PERP_SEX, PERP_R...
## dbl (7): INCIDENT_KEY, PRECINCT, JURISDICTION_CODE, X_COORD_CD, Y_COORD_CD...
## lgl (1): STATISTICAL_MURDER_FLAG
## time (1): OCCUR_TIME
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
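The month-number-to-name step above could also be written with base R's built-in `month.name` vector instead of spelling out all twelve pairs in `recode()`. The following is only a minimal sketch on made-up month numbers (the `month` vector here is hypothetical and stands in for the parsed column), not the code used for this report.

```r
# Hypothetical month numbers, standing in for the parsed `month` column
month <- c(1, 7, 12)

# month.name is a built-in character vector ("January", ..., "December");
# indexing it by month number and lower-casing mirrors the recode() call
month_name <- tolower(month.name[month])

print(month_name)
```

Indexing a lookup vector keeps the mapping in one place, at the cost of being slightly less explicit than the named `recode()` pairs.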
pal <- colorFactor("viridis", nypd_shooting_df$year)
nypd_shooting_df %>%
  mutate(
    text_label = str_c(month_name, " ", year, ", ", boro, ", ", location_desc)) %>%
  leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .1, color = ~pal(year), label = ~text_label) %>%
  addLegend("bottomright", pal = pal, values = ~year,
            title = "year")
Cluster map

We further explored the map by identifying hotspots of shootings within each region. These are indicated in the cluster map below. When you zoom in on the map, the clusters in each specific region become more granular. Go ahead, give it a try.
nypd_shooting_df %>%
  leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .25) %>%
  addMarkers(
    clusterOptions = markerClusterOptions())
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
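The "Assuming ..." message above appears because `addMarkers()` was not given coordinates and leaflet guessed them from the column names. A minimal sketch of the same clustering idea with the coordinates passed explicitly is shown here; the three points in `pts` are hypothetical stand-ins for the shooting data, and this is an illustration rather than the exact code behind the map.

```r
library(leaflet)

# Hypothetical points, standing in for the shooting coordinates
pts <- data.frame(
  latitude  = c(40.7128, 40.7306, 40.6782),
  longitude = c(-74.0060, -73.9866, -73.9442)
)

# Passing lng/lat explicitly avoids leaflet guessing the columns;
# markerClusterOptions() groups nearby markers and splits them on zoom
cluster_map <- leaflet(pts) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addMarkers(lng = ~longitude, lat = ~latitude,
             clusterOptions = markerClusterOptions())
```

Printing `cluster_map` in an interactive session or a knitted document renders the widget.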
Shootings by location type

We wanted to see which location types were most prone to shootings. The bar chart below shows that the most common locations were multi-dwelling public housing and apartment buildings, private houses, grocery stores/bodegas, and bars/night clubs.
nypd_shooting_df %>%
  count(location_desc) %>%
  # order the bars from fewest to most shootings
  mutate(location_desc = fct_reorder(location_desc, n)) %>%
  na.omit() %>%
  plot_ly(x = ~location_desc, y = ~n, color = ~location_desc, type = "bar", colors = "viridis")
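The bar order in the chart comes from `fct_reorder()`, which reorders factor levels by a second variable. A minimal sketch on made-up counts (the category names and numbers below are hypothetical, not taken from the data) shows the effect:

```r
library(forcats)

# Hypothetical location types and counts
location_desc <- c("bar", "grocery", "house")
n <- c(5, 12, 2)

# Levels are reordered by ascending n, so plots draw the
# smallest category first and the largest last
ordered_desc <- fct_reorder(location_desc, n)

print(levels(ordered_desc))
```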
Top shooting locations by borough

We wanted to further explore the top 5 shooting locations by borough. In all boroughs, multi-dwelling public housing sites had the highest proportion of shootings, and multi-dwelling apartment buildings had the second highest. Bars and nightclubs were among the more common shooting locations in Manhattan, the Bronx, and Queens, but not in Brooklyn or Staten Island.
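nypd_shooting_df %>%
  # na_if() operates on vectors, so apply it to the location column
  mutate(location_desc = na_if(location_desc, "none")) %>%
  group_by(boro, location_desc) %>%
  summarise(
    count = n()) %>%
  mutate(
    percentage = count / sum(count) * 100) %>%
  arrange(desc(percentage)) %>%
  slice(1:5) %>%
  # missing location types are dropped after slicing,
  # so a borough may show fewer than five categories
  drop_na(location_desc) %>%
  ggplot(aes(fill = location_desc, y = count, x = boro)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Top 5 shooting locations by boro")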
## `summarise()` has grouped output by 'boro'. You can
## override using the `.groups` argument.
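The "top locations per borough" calculation can be illustrated on a small made-up table. This sketch uses `slice_max()` in place of the `arrange()` and `slice()` pair, which is a swapped-in alternative rather than the approach used above. The `toy` data and the choice of the top 2 (instead of 5) are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical incidents: 3 in the Bronx, 6 in Queens
toy <- tibble(
  boro = c(rep("bronx", 3), rep("queens", 6)),
  location_desc = c("bar", "bar", "grocery",
                    "grocery", "grocery", "grocery", "bar", "bar", "house")
)

top_locations <- toy %>%
  count(boro, location_desc, name = "count") %>%
  group_by(boro) %>%
  # share of each location type within its borough
  mutate(percentage = count / sum(count) * 100) %>%
  # keep the 2 most frequent location types per borough
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_locations)
```

Computing the percentage before slicing keeps the denominator equal to all incidents in the borough, not just the retained rows.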
